Abstract

In this paper, a novel steganographic scheme based on chaotic iterations is proposed. This research work is set in the framework of information hiding security. Applications to anonymity and privacy on the Internet are also considered. To guarantee such anonymity, it should be possible to set up, within a web page, a secret communication channel that is both secure and robust. To achieve this goal, we propose a stego-secure information hiding scheme; stego-security is the highest level of security in a well-defined and well-studied category of attacks called "watermark-only attacks". This category of attacks is the most relevant context in which to study steganography-based anonymity on the Internet. The steganalysis of our steganographic process is also carried out in order to show its security in a real test framework.

Key Words: Anonymity; Privacy; Internet; Information hiding; Steganography; Security; Chaotic iterations.



1 Introduction 

In popular opinion, or for non-specialists, anonymity on the Internet is only desirable for malicious purposes. A frequent assumption is that individuals who seek or use anonymity tools have something wrong or shameful to hide. Following this reasoning, since privacy and anonymity software such as proxies or Tor [29], [6] would only be used by terrorists, pedophiles, arms dealers, and so on, such tools should be forbidden. However, terrorism and pedophilia existed long before the Internet. Furthermore, recent events remind us that, in numerous places around the world, holding an opinion that diverges from the one imposed by political or religious leaders is considered negative, suspicious, or illegal. For instance, the Saudi blogger Hamza Kashgari was jailed and may face execution after tweets about Muhammad. Generally speaking, the so-called Arab Spring, and the current fighting and uncertainty in Syria, have taught us the following facts. First, the Internet is a medium of major importance, which is difficult to stop or silence, and which bears witness to the need for democracy, transparency, and efforts to combat corruption. Second, voicing one's opinions, practicing journalism, or engaging in politics is dangerous in various states and can lead to the death penalty (as for numerous Iranian bloggers: Hossein Derakhshan [27], Vahid Asghari [28], etc.).

Considering that freedom of expression is a fundamental right that must be protected, that journalists must be able to inform the community without risking their own lives, and that being a defender of human rights can be dangerous, various software tools have emerged over recent decades to preserve anonymity or privacy on the Internet. The most famous tool of this kind is probably Tor, the onion router. The Tor client software routes Internet traffic through a worldwide volunteer network of servers, in order to conceal a user's location or usage from anyone conducting network surveillance or traffic analysis. Another example is given by Perseus [8], a Firefox plugin [9] that protects personal data without infringing any national cryptography regulations, while preserving the legitimate needs of national security. Perseus replaces cryptography with coding theory techniques, so that only agencies with sufficient computing power can eavesdrop on traffic in an acceptable amount of time. Finally, anonymous proxy servers around the world can help keep the machines behind them anonymous: the destination server (the server that ultimately satisfies the web request) receives requests from the anonymizing proxy server, and thus does not receive information about the end user's address.

These three solutions are not without flaws. For instance, when considering anonymizers, the requests are not anonymous to the anonymizing proxy server, which simply displaces the problem: are these proxy servers trustworthy? Perseus can be broken with enough computing power. And due to its central position and particular design, Tor is targeted by numerous attacks and presents various weaknesses (the bad apple attack, or the fact that Tor cannot protect against monitoring of traffic at the boundaries of the Tor network).

Considering these flaws, and because having a variety of solutions to provide anonymity is a good rule of thumb, a steganographic approach is often considered in this context [12]. Steganography can be applied in several ways to preserve anonymity on the Internet, encompassing the creation of secret channels in background images of websites, in Facebook photo galleries, in audio or video streams, or in non-interpreted characters of HTML source code. The authors' intention is not to describe these well-known techniques in detail, but to explain how to evaluate their security. This evaluation is applied to a new steganographic algorithm based on chaotic iterations and on data embedding in least significant coefficients. The remainder of this article is organized as follows.

In Section 2, some basic reminders concerning mathematical notions and notations, as well as the Most and Least Significant Coefficients, are given. Our new steganographic process, called DI3, which is suitable for guaranteeing the anonymity of data for privacy on the Internet, is presented in Section 3. In Section 4, a reminder about information hiding security is given: the classification of attacks in a steganographic framework is recalled, along with the notion of stego-security. In Section 5, the security of our new scheme is evaluated. Then, in Section 6, the steganalysis of the proposed process is carried out, and the process is compared with other steganographic schemes from the literature. This research work ends with a conclusion section, where our contribution is summarized and intended future work is presented.

2 Basic Reminders 

2.1 Mathematical definitions and notations 

Let $S^n$ denote the $n$-th term of a sequence $S$, and $V_i$ the $i$-th component of a vector $V$. For $a, b \in \mathbb{N}$, we use the following notation: $[a; b] = \{a, a+1, a+2, \ldots, b\}$.

Definition 1: Let $k \in \mathbb{N}^*$. The set of all sequences whose elements belong to $[1; k]$, called strategy adapters on $[1; k]$, is denoted by $\mathbb{S}_k$. □

Definition 2: The support of a finite sequence $S$ of $n$ terms is the finite set $\mathcal{S}(S) = \{S^k, k \leq n\}$ containing all the distinct values of $S$. Its cardinality satisfies $\#\mathcal{S}(S) \leq n$. □

Definition 3: A finite sequence $S \in \mathbb{S}_N$ of $n$ terms is injective if $n = \#\mathcal{S}(S)$. It is onto if $N = \#\mathcal{S}(S)$. Finally, it is bijective if and only if it is both injective and onto, so $n = N = \#\mathcal{S}(S)$. □

Remark 1: On the one hand, "S is injective" reflects the fact that all the $n$ terms of the sequence $S$ are distinct. On the other hand, "S is onto" means that all the values of the set $[1; N]$ are reached at least once. □
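As an illustration of Definitions 2 and 3 and Remark 1, the following Python sketch computes the support of a finite sequence and tests whether it is injective, onto, or bijective. The helper names (`support`, `is_injective`, `is_onto`, `is_bijective`) are hypothetical, and sequences are plain Python lists whose terms are assumed to lie in $[1; N]$.

```python
def support(S):
    """Support of a finite sequence: the set of its distinct values."""
    return set(S)


def is_injective(S):
    """Injective: the n terms are pairwise distinct, i.e. n == #support(S)."""
    return len(S) == len(support(S))


def is_onto(S, N):
    """Onto: every value of [1; N] is reached, i.e. N == #support(S).

    S is assumed to be a strategy adapter on [1; N].
    """
    if not all(1 <= s <= N for s in S):
        raise ValueError("S must take its values in [1; N]")
    return len(support(S)) == N


def is_bijective(S, N):
    """Bijective: injective and onto, so n == N == #support(S)."""
    return is_injective(S) and is_onto(S, N)


print(is_bijective([3, 1, 2], 3))  # True
print(is_injective([2, 1, 2, 3]))  # False (2 appears twice)
print(is_onto([2, 1, 2, 3], 3))    # True (1, 2 and 3 are all reached)
```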

2.2 The Most and Least Significant Coefficients 

We first notice that the terms of the original content $x$ that may be replaced by terms issued from the watermark $y$ are less important than the others: they could be changed without being perceived as such. More generally, a signification function attaches a weight to each term defining a digital medium, depending on its position $k$.

Definition 4: A signification function is a real sequence $(u^k)_{k \in \mathbb{N}}$. □

Example 1: Let us consider a set of grayscale images stored in portable graymap format (P3-PGM): each pixel takes one of 256 gray levels, i.e., is stored with eight bits. In that context, we consider $u^k = 8 - (k \bmod 8)$ to be the $k$-th term of a signification function $(u^k)_{k \in \mathbb{N}}$. Intuitively, in each group of eight bits (i.e., for each pixel), the first bit has an importance equal to 8, whereas the last bit has an importance equal to 1. This is consistent with the idea that changing the first bit affects the image more than changing the last one. □
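The signification function of Example 1 can be written down directly. The Python sketch below assumes that bits are numbered from $k = 0$ and that each 8-bit pixel is unpacked most significant bit first, so that the bit of weight 8 is the highest-order bit of the gray level; the function names are hypothetical.

```python
def signification(k):
    """u^k = 8 - (k mod 8): importance of the k-th bit (k counted from 0)."""
    return 8 - (k % 8)


def pixel_bits(pixels):
    """Unpack 8-bit gray levels into a bit stream, most significant bit first."""
    return [(p >> (7 - j)) & 1 for p in pixels for j in range(8)]


bits = pixel_bits([200, 3])
weights = [signification(k) for k in range(len(bits))]
print(bits[:8])     # [1, 1, 0, 0, 1, 0, 0, 0]  (200 = 0b11001000)
print(weights[:8])  # [8, 7, 6, 5, 4, 3, 2, 1]
```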

Definition 5: Let $(u^k)_{k \in \mathbb{N}}$ be a signification function, and let $m$ and $M$ be two reals such that $m < M$.

• The most significant coefficients (MSCs) of $x$ form the finite vector
$$u_M = \left(k \mid k \in \mathbb{N} \text{ and } u^k \geq M \text{ and } k < |x|\right);$$

• The least significant coefficients (LSCs) of $x$ form the finite vector
$$u_m = \left(k \mid k \in \mathbb{N} \text{ and } u^k \leq m \text{ and } k < |x|\right);$$

• The passive coefficients of $x$ form the finite vector
$$u_p = \left(k \mid k \in \mathbb{N} \text{ and } u^k \in \,]m; M[ \text{ and } k < |x|\right).$$

For a given host content $x$, MSCs are thus the positions of $x$ that describe the relevant part of the image, whereas LSCs correspond to its less significant parts.
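Definition 5 amounts to a three-way partition of the positions of $x$. The Python sketch below assumes 0-based positions $k < |x|$ and the signification function of Example 1; the function name is hypothetical, and the thresholds $m = 3$ and $M = 6$ in the usage line are illustrative values only, not those used for the article's figure.

```python
def classify_coefficients(length, u, m, M):
    """Return (MSCs, LSCs, passive) position lists following Definition 5.

    MSCs: u(k) >= M; LSCs: u(k) <= m; passive: m < u(k) < M.
    """
    if not m < M:
        raise ValueError("Definition 5 requires m < M")
    msc, lsc, passive = [], [], []
    for k in range(length):
        w = u(k)
        if w >= M:
            msc.append(k)
        elif w <= m:
            lsc.append(k)
        else:
            passive.append(k)
    return msc, lsc, passive


def u(k):
    # Signification function of Example 1 (bits counted from 0, MSB first).
    return 8 - (k % 8)


print(classify_coefficients(8, u, m=3, M=6))
# ([0, 1, 2], [5, 6, 7], [3, 4])
```

Because the passive interval is open, every position falls into exactly one of the three lists.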

Example 2: These two definitions are illustrated in Figure 1, where the signification function $(u^k)$ is defined as in Example 1, $M = 5$, and $m = 6$.



3 The New Process: DI3

In this section, a new algorithm, inspired by the CIS2 scheme described in , is presented. It is easier to implement for Internet applications, especially for guaranteeing anonymization. Moreover, this new scheme DI3 appears to be faster than CIS2, which is a major advantage for achieving fast response times on the Internet.

Let us first introduce the following notations. $P \in \mathbb{N}^*$ is the width, in terms of bits, of the message to embed into the cover media. $\lambda \in \mathbb{N}^*$ is the number of iterations to perform, which satisfies $\lambda > P$. $x^0 \in \mathbb{B}^N$ is the vector of the $N$ LSCs of a given cover medium $C$, which are supposed to be uniformly distributed. $m \in \mathbb{B}^P$ is the message to hide into $x^0$. Finally, $S \in \mathbb{S}_P$ is a strategy such that the finite sequence $\{S^k, k \in [\lambda - P + 1; \lambda]\}$ is injective.

(a) Original Lena. (b) MSCs of Lena. (c) LSCs of Lena (×17).

Figure 1: Most and least significant coefficients of Lena.

Remark 2: The width $P$ of the message to hide into the LSCs of the cover medium $x^0$ has to be far smaller than the number $N$ of LSCs. □

The proposed information hiding scheme is defined by: 

Definition 6 (DI3 data hiding scheme): $\forall (n, i, j) \in \mathbb{N}^* \times [0; N-1] \times [0; P-1]$,
$$x_i^n = \begin{cases} x_i^{n-1} & \text{if } S^n \neq i, \\ m_{S^n} & \text{if } S^n = i. \end{cases}$$

The stego-content is the Boolean vector $y = x^\lambda \in \mathbb{B}^N$, which replaces the original LSCs of the cover medium.
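A minimal Python sketch of the iterations of Definition 6 follows. It assumes that strategy terms are 0-based message indices in $[0; P-1]$ (matching the quantifier of Definition 6 rather than the $[1; k]$ range of Definition 1), and it draws the strategy with Python's `secrets.SystemRandom`, making its last $P$ terms a random permutation so that the injectivity condition of Section 3 holds. The function names `random_strategy` and `di3_embed` are hypothetical.

```python
import secrets


def random_strategy(P, lam):
    """Draw a strategy S^1..S^lam over [0; P-1] whose last P terms are injective.

    Assumption: strategy terms are 0-based message indices. The first
    lam - P terms are uniform; the last P terms are a random permutation.
    """
    if lam <= P:
        raise ValueError("Section 3 requires lambda > P")
    rng = secrets.SystemRandom()
    head = [rng.randrange(P) for _ in range(lam - P)]
    tail = list(range(P))
    rng.shuffle(tail)
    return head + tail


def di3_embed(x0, m, strategy):
    """Apply x^n_i = m_{S^n} if S^n == i, else x^{n-1}_i, and return y = x^lam."""
    if len(m) > len(x0):
        raise ValueError("the message must not be longer than the LSC vector")
    x = list(x0)
    for s in strategy:
        if not 0 <= s < len(m):
            raise ValueError("strategy term outside [0; P-1]")
        x[s] = m[s]  # only the coordinate i = S^n is updated at step n
    return x


x0 = [1, 0, 1, 1, 0, 0, 1, 0]  # N = 8 LSCs of a cover medium
m = [0, 1, 1]                  # P = 3 message bits
print(di3_embed(x0, m, [2, 0, 1, 2, 0]))  # [0, 1, 1, 1, 0, 0, 1, 0]
y = di3_embed(x0, m, random_strategy(len(m), lam=10))
```

Under this reading, coordinates of index $i \geq P$ are never touched, and because every index of $[0; P-1]$ is visited during the last $P$ iterations, the first $P$ coordinates of $y$ end up equal to $m$ whatever the earlier terms of $S$.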



4 Data Hiding Security and Robustness 
4.1 Security and robustness 

Even if security and robustness are neighboring concepts without clearly established definitions [21], robustness is often considered to be mostly concerned with blind elementary attacks, whereas security is not limited to certain specific attacks. Indeed, security encompasses robustness and intentional attacks [16], [7]. The best attempt to give an elegant and concise definition of each of these two terms was proposed in [16]. Following Kalker, we will consider the following two definitions in this research work:




Figure 2: Simmons' prisoner problem [24]

Definition 7 (Security [16]): Watermarking security refers to the inability by unauthorized users to have access to the raw watermarking channel [...] to remove, detect and estimate, write or modify the raw watermarking bits. □

Definition 8 (Robustness [16]): Robust watermarking is a mechanism to
create a communication channel that is multiplexed into original content [...] 
It is required that, firstly, the perceptual degradation of the marked content 
[...] is minimal and, secondly, that the capacity of the watermark channel 
degrades as a smooth function of the degradation of the marked content. □ 

In this article, we focus more specifically on the security aspects, which have been formalized in Simmons' prisoner problem.

4.2 The prisoner problem 

In Simmons' prisoner problem [24], Alice and Bob are in jail, and they want to devise an escape plan by exchanging hidden messages in innocent-looking cover contents (Fig. 2). These messages are to be conveyed to one another by a common warden, Eve, who eavesdrops on all contents and can choose to interrupt the communication if they appear to be stego-contents.

4.3 Classification of Attacks 

In the steganographic framework, and in the context of Simmons' prisoner problem, attacks have been classified in [5] as follows.

Definition 9 (Classes of attacks): 

WOA: A Watermark-Only Attack occurs when an attacker only has access to several watermarked contents.

KMA: A Known-Message Attack occurs when an attacker has access to several pairs of watermarked contents and corresponding hidden messages.

KOA: A Known-Original Attack occurs when an attacker has access to several pairs of watermarked contents and their corresponding original versions.

CMA: A Constant-Message Attack occurs when the attacker observes several watermarked contents and only knows that the unknown hidden message is the same in all contents. □



A synthesis of this classification is given in Table 1.

Class | Original content | Stego content | Hidden message
WOA   |                  | X             |
KMA   |                  | X             | X
KOA   | X                | X             |
CMA   |                  | X             |

Table 1: Watermarking attacks classification in the context of [16]



In this article, we will focus more specifically on the "Watermark-Only 
Attack" situation, which is the most relevant category when considering 
anonymity and privacy protection through the Internet. 

4.4 Reminder about Stego-Security 

The stego-security, defined in the Watermark-Only Attack (WOA) framework, is the highest security level that can be defined in this setup [5].

Definition 10 (Stego-Security): Let $\mathbb{K}$ be the set of embedding keys, $p(X)$ the probabilistic model of $N_0$ initial host contents, and $p(Y|K_1)$ the probabilistic model of $N_0$ watermarked contents. Moreover, each host content has been watermarked with the same secret key $K_1$ and the same embedding function $e$. Then $e$ is said to be stego-secure if:
$$\forall K_1 \in \mathbb{K}, \quad p(Y|K_1) = p(X).$$

Until now, only three schemes have been proven stego-secure. On the one hand, the authors of [5] have established that the spread spectrum technique called Natural Watermarking is stego-secure when its distortion parameter $\eta$ is equal to 1. On the other hand, we have proposed, in [13] and , two other data hiding schemes satisfying this security property.

5 Security Study 

Let us prove the following result.

Proposition 1: DI3 is stego-secure. □
Proof. Let us suppose that $x^0 \sim \mathcal{U}(\mathbb{B}^N)$, $m \sim \mathcal{U}(\mathbb{B}^P)$, and $S \sim \mathcal{U}(\mathbb{S}_P)$ in a DI3 setup, where $\mathcal{U}(X)$ denotes the uniform distribution on $X$. We will prove by mathematical induction that $\forall n \in \mathbb{N}$, $x^n \sim \mathcal{U}(\mathbb{B}^N)$. The base case is obvious according to the uniform distribution hypothesis.

Let us now suppose that the statement $x^n \sim \mathcal{U}(\mathbb{B}^N)$ holds for some $n$, that is, $P(x^n = k) = \frac{1}{2^N}$ for every $k \in \mathbb{B}^N$.

For a given $k \in \mathbb{B}^N$ and $i \in [0; N-1]$, we denote by $\overline{k}_i \in \mathbb{B}^N$ the vector defined by: if $k = (k_0, k_1, \ldots, k_i, \ldots, k_{N-2}, k_{N-1})$, then $\overline{k}_i = (k_0, k_1, \ldots, \overline{k_i}, \ldots, k_{N-2}, k_{N-1})$, where $\overline{x}$ is the negation of the bit $x$.

Let $p$ be defined by $p = P(x^{n+1} = k)$. Let $E_j$ and $E$ be the events defined by:
$$\forall j \in [0; P-1], \quad E_j = (x^n = \overline{k}_j) \wedge (S^{n+1} = j) \wedge (m_{S^{n+1}} = k_j),$$
$$E = (x^n = k) \wedge (m_{S^{n+1}} = x^n_{S^{n+1}}).$$
So $p = P\left(E \vee \bigvee_{j=0}^{P-1} E_j\right)$.

On the one hand, $\forall j \in [0; P-1]$, the event $E_j$ is a conjunction of the sub-event $(S^{n+1} = j)$ and other sub-events. As the sub-events $(S^{n+1} = j)$, $j \in [0; P-1]$, are clearly pairwise disjoint, all the events $E_j$ are pairwise disjoint too.

On the other hand, $\forall j \in [0; P-1]$, the events $E_j$ and $E$ are disjoint, because $E_j$ contains the sub-event $(x^n = \overline{k}_j)$, whereas $E$ contains the sub-event $(x^n = k)$, and these two sub-events are clearly disjoint since $\overline{k}_j \neq k$.

As a consequence, using the additivity of probability over disjoint events, we can claim that:
$$p = P(E) + \sum_{j=0}^{P-1} P(E_j).$$
We now evaluate both $P(E)$ and $P(E_j)$.

1. The case of $P(E)$: as the two events $(x^n = k)$ and $(m_{S^{n+1}} = x^n_{S^{n+1}})$ concern two different sequences, they are clearly independent. Then, by the inductive hypothesis, $P(x^n = k) = \frac{1}{2^N}$. So,
$$\begin{aligned}
P(E) &= P(x^n = k) \times P(m_{S^{n+1}} = x^n_{S^{n+1}}) \\
&= \frac{1}{2^N} \times \left[ P(m_{S^{n+1}} = 0)\, P(x^n_{S^{n+1}} = 0) + P(m_{S^{n+1}} = 1)\, P(x^n_{S^{n+1}} = 1) \right] \\
&= \frac{1}{2^N} \times \left[ P(m_{S^{n+1}} = 0)\, P(x^n_{S^{n+1}} = 0) + P(m_{S^{n+1}} = 1) \left(1 - P(x^n_{S^{n+1}} = 0)\right) \right] \\
&= \frac{1}{2^N} \times \left[ \frac{1}{2} P(x^n_{S^{n+1}} = 0) + \frac{1}{2} \left(1 - P(x^n_{S^{n+1}} = 0)\right) \right] \\
&= \frac{1}{2^{N+1}}.
\end{aligned}$$

2. Evaluation of $P(E_j)$: as the three events $(x^n = \overline{k}_j)$, $(S^{n+1} = j)$, and $(m_{S^{n+1}} = k_j)$ deal with three different sequences, they are clearly independent. So
$$\begin{aligned}
P(E_j) &= P(x^n = \overline{k}_j) \times P(S^{n+1} = j) \times P(m_{S^{n+1}} = k_j) \\
&= \frac{1}{2^N} \times \frac{1}{P} \times \frac{1}{2} \\
&= \frac{1}{P} \times \frac{1}{2^{N+1}},
\end{aligned}$$
due to the hypothesis of uniform distribution of $S$ and $m$.

Consequently,
$$p = P(E) + \sum_{j=0}^{P-1} P(E_j) = \frac{1}{2^{N+1}} + \sum_{j=0}^{P-1} \frac{1}{P} \times \frac{1}{2^{N+1}} = \frac{1}{2^{N+1}} + \frac{1}{2^{N+1}} = \frac{1}{2^N}.$$

Finally, $P(x^{n+1} = k) = \frac{1}{2^N}$, which leads to $x^{n+1} \sim \mathcal{U}(\mathbb{B}^N)$. As this result is true $\forall n \in \mathbb{N}$, we have proven that the stego-content $y$ is uniformly distributed in the set of possible stego-contents: $y \sim \mathcal{U}(\mathbb{B}^N)$ when $x^0 \sim \mathcal{U}(\mathbb{B}^N)$. □

Remark 3 (Distribution of LSCs): We have supposed that $x^0 \sim \mathcal{U}(\mathbb{B}^N)$ to prove the stego-security of the data hiding process DI3. This hypothesis is the most restrictive one, but it can be satisfied, at least partially, in two possible ways. Either a channel that appears to be random (for instance, when applying a chi-squared test) can be found in the media, or a systematic process can be applied to the images to obtain this uniformity, as follows: before embedding the hidden message, all the original LSCs are replaced by randomly generated ones, in the hope that such cover media will be considered noisy by any given attacker.

Let us remark that, in the field of data anonymity for privacy on the Internet, we are in the "watermark-only attack" framework. As recalled in Table 1, in that framework the attacker only has access to stego-contents, and thus has no knowledge of the original media before the message is introduced in the random channel (LSCs). However, this assumption of the existence of a random channel, natural or artificial, in the cover images is clearly the most disputable one of this research work. The authors intend to investigate this hypothesis more thoroughly in future work, by studying the distribution of several LSCs contained in a large variety of images chosen randomly on the Internet. Among other things, we will check whether some well-defined LSCs are naturally uniformly distributed in most cases. To conduct such studies, we intend to use the well-known NIST (National Institute of Standards and Technology of the U.S. Government) test suite [23], the DieHARD [20] battery, or the stringent TestU01 113/. Depending on the results of this search for randomness in natural images, the need to introduce an artificial random channel could possibly be removed. □

Remark 4 (Distribution of the messages m): In order to prove the stego-security of the data hiding process DI3, we have supposed that $m \sim \mathcal{U}(\mathbb{B}^P)$. This hypothesis is not really restrictive. Indeed, encrypting the message before embedding it into the LSCs of the cover media is sufficient to achieve this goal. In other words, to be within the conditions of application of the process DI3, the hidden message must be encrypted. □
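Remark 4 does not prescribe a particular cipher. As one illustrative choice, a one-time pad yields uniformly distributed ciphertext bits whatever the plaintext, provided that the key is uniformly random, at least as long as the message, and never reused. The Python sketch below (hypothetical helper names; keys drawn with `secrets.token_bytes`) turns an encrypted message into the bit vector $m$ to embed, and recovers the plaintext from it.

```python
import secrets


def to_bits(data):
    """Bytes -> list of bits, most significant bit first."""
    return [(b >> (7 - j)) & 1 for b in data for j in range(8)]


def from_bits(bits):
    """List of bits (length a multiple of 8) -> bytes, most significant bit first."""
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


def otp_encrypt(plaintext, key):
    """XOR the plaintext with a one-time pad and return the message bits m."""
    if len(key) < len(plaintext):
        raise ValueError("the pad must be at least as long as the message")
    return to_bits(bytes(p ^ k for p, k in zip(plaintext, key)))


def otp_decrypt(bits, key):
    """Invert otp_encrypt with the same pad."""
    return bytes(c ^ k for c, k in zip(from_bits(bits), key))


key = secrets.token_bytes(16)           # shared secret, used only once
m = otp_encrypt(b"meet at noon", key)   # bit vector to embed, here P = 96
assert otp_decrypt(m, key) == b"meet at noon"
```

Any other cipher whose output is indistinguishable from uniform bits would serve the same purpose; the one-time pad is used here only because its uniformity is exact and easy to see.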

Remark 5 (Distribution of the strategies S): To prove the stego-security of the data hiding process DI3, we have finally supposed that $S \sim \mathcal{U}(\mathbb{S}_P)$. This hypothesis is not restrictive either, as any cryptographically secure pseudorandom number generator (PRNG) satisfies this property: with such PRNGs, it is impossible, in polynomial time, to distinguish between truly random numbers and numbers provided by these generators. For instance, Blum Blum Shub (BBS) HSi, Blum Goldwasser (BG), or ISAAC |Xf are suitable here. □
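For toy sizes, the conclusion of Proposition 1 can be checked exhaustively. The Python sketch below enumerates every cover vector $x^0 \in \mathbb{B}^N$, every message $m \in \mathbb{B}^P$, and every admissible strategy, under our assumption that "uniform" means that all strategies of length $\lambda$ over $[0; P-1]$ whose last $P$ terms are injective are equally likely. It then verifies that each iterate $x^n$, and hence $y = x^\lambda$, is uniform on $\mathbb{B}^N$. The function names are hypothetical, and this is a sanity check on small instances, not a substitute for the proof.

```python
from collections import Counter
from itertools import permutations, product


def di3_history(x0, m, strategy):
    """Return [x^0, x^1, ..., x^lam] for the DI3 iterations (0-based indices)."""
    x = list(x0)
    history = [tuple(x)]
    for s in strategy:
        x[s] = m[s]
        history.append(tuple(x))
    return history


def admissible_strategies(P, lam):
    """All strategies over [0; P-1] of length lam whose last P terms are injective."""
    for head in product(range(P), repeat=lam - P):
        for tail in permutations(range(P)):
            yield head + tail


def all_iterates_uniform(N, P, lam):
    """Exhaustively check that every x^n (0 <= n <= lam) is uniform on B^N."""
    counts = [Counter() for _ in range(lam + 1)]
    for x0 in product((0, 1), repeat=N):
        for m in product((0, 1), repeat=P):
            for S in admissible_strategies(P, lam):
                for n, xn in enumerate(di3_history(x0, m, S)):
                    counts[n][xn] += 1
    # Uniform: all 2^N vectors occur, each with the same count.
    return all(len(c) == 2 ** N and len(set(c.values())) == 1 for c in counts)


print(all_iterates_uniform(N=3, P=2, lam=3))  # True
```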



6 Steganalysis 

The steganographic scheme detailed in this article has been compared to state-of-the-art steganographic approaches, namely YASS [25], HUGO [22], and nsF5 [10].

The steganalysis is based on the BOSS image database [4], which consists of a set of 10,000 512×512 grayscale images. We randomly selected 50 of them to build the cover set. Since YASS and nsF5 are dedicated to JPEG images, all these images were first converted into JPEG format with the mogrify command-line tool. To allow the comparison between steganographic schemes, the relative payload is always set to 0.1 bit per pixel. Under that constraint, the embedded message m is a sequence of 26214 randomly generated bits. This step produced four sets of stego contents, one for each steganographic approach.
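For reference, the message length follows from the relative payload and the image size, assuming the product is rounded down to a whole number of bits:
$$0.1 \ \text{bit/pixel} \times 512 \times 512 \ \text{pixels} = 26214.4 \ \text{bits}, \qquad \lfloor 26214.4 \rfloor = 26214 \ \text{bits}.$$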

Next, we use the steganalysis tool developed by the HugoBreakers team [17], [18], which is based on an AI classifier and which won the BOSS competition [4]. Table 2 summarizes the steganalysis results, expressed as the error probabilities of the steganalyser. The error is the mean of the false-alarm probability and the missed-detection probability. An error close to 0.5 means that deciding whether an image contains a stego content amounts to a random choice for the steganalyser. Conversely, a small error means that the steganalyser can easily distinguish stego contents from non-stego contents.
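The error measure described above can be written as $P_E = \frac{1}{2}(P_{FA} + P_{MD})$, the mean of the false-alarm rate on cover images and the missed-detection rate on stego images. A minimal Python sketch follows; the function and argument names are hypothetical, and the counts in the usage lines are made up for illustration.

```python
def error_probability(false_alarms, n_cover, missed, n_stego):
    """P_E = (P_FA + P_MD) / 2, from counts on cover and stego test images."""
    if n_cover <= 0 or n_stego <= 0:
        raise ValueError("need at least one cover and one stego image")
    p_fa = false_alarms / n_cover  # covers wrongly flagged as stego
    p_md = missed / n_stego        # stego images classified as covers
    return (p_fa + p_md) / 2


# A steganalyser that guesses at random errs on about half of each set.
print(error_probability(25, 50, 25, 50))  # 0.5
# A perfect steganalyser never errs.
print(error_probability(0, 50, 0, 50))    # 0.0
```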



Steganographic tool | Error probability
DI3                 | 0.4133
YASS                | 0.0067
HUGO                | 0.495
nsF5                | 0.47

Table 2: Steganalysis results of the HugoBreakers steganalyser applied to each steganographic scheme
The best result is obtained by HUGO, which is close to a perfect steganographic approach with respect to the considered steganalyser, since its error is about 0.5. However, even though the approach detailed here has not been optimized in any way, these first experiments show promising results. We finally note that the HugoBreakers steganalyser should achieve lower error rates on larger image databases, e.g., when applied to the whole BOSS image database.

7 Conclusion and Future Work 

Steganography is a real alternative for guaranteeing anonymity on the Internet. For instance, the scheme presented in this article offers a secure solution to achieve this goal, thanks to its stego-security. Even if this new scheme DI3 does not possess topological properties (unlike CIS2), its level of security seems sufficient for Internet applications. Indeed, we are in the Watermark-Only Attack (WOA) framework, where stego-security is the highest level of security. Additionally, this new scheme is faster than CIS2. This is a major advantage for use on the Internet, where the response times of web sites must be respected. Moreover, for this first version of the process, the steganalysis results are promising.

In future work, various improvements of this scheme are planned to obtain better scores against steganalysers. For instance, LSCs will be embedded into various frequency domains. The robustness of the proposed scheme will be evaluated too [1], to determine whether this information hiding algorithm can be relevant in other Internet domains where data hiding techniques are of interest, such as the semantic web. Finally, a cryptographic approach to information hiding security, which extends Simmons' prisoner problem [3], is currently being investigated, and we intend to evaluate the proposed scheme in this framework.